Qi Liu

Xidian University

The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

May 29, 2026
Via arXiv

Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

May 28, 2026
Via arXiv

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

May 26, 2026
Via arXiv

What Really Improves Mathematical Reasoning: Structured Reasoning Signals Beyond Pure Code

May 19, 2026
Via arXiv

More Edits, More Stable: Understanding the Lifelong Normalization in Sequential Model Editing

May 12, 2026
Via arXiv

Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration

May 11, 2026
Via arXiv

MaMi-HOI: Harmonizing Global Kinematics and Local Geometry for Human-Object Interaction Generation

May 07, 2026
Via arXiv

RobotEQ: Transitioning from Passive Intelligence to Active Intelligence in Embodied AI

May 07, 2026
Via arXiv

GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification

May 05, 2026
Via arXiv

Perceptual Flow Network for Visually Grounded Reasoning

May 04, 2026
Via arXiv